Text generation method and text generation device

ABSTRACT

A text generation method and a text generation device generate meaningful text even when the number of input keywords is insufficient. To this end, keywords are input to the text generation device. In the text generation device, a Bunsetsu-generation-rule acquisition unit acquires Bunsetsu generation rules from a corpus, and a Bunsetsu-candidate generation unit generates Bunsetsu candidates from the keywords. A text candidate generation unit generates text candidates by assuming dependency relations among the Bunsetsu candidates. An evaluation unit evaluates the text candidates and outputs text in accordance with the evaluation.

TECHNICAL FIELD

The present invention relates to a method and a device for processing natural language and, in particular, to a method for generating text from several keywords.

BACKGROUND ART

Text generation is one of the elemental technologies used for a variety of natural language processing applications, such as machine translation, summarization, and dialogue systems. Recently, many corpora have become available, and therefore, these corpora are used for generating natural text. One typical example is a language model used for machine translation, which translates a source language to a target language.

For example, Japanese Patent Application No. 2001-395618 by the present inventors discloses a text generation system in which replaced words and phrases in a target language are ordered in the most likely sequence so as to generate text in the target language. In general, an input to a language model is a word set. The function of the language model is primarily to sort these words.

Such a known system assumes that sorting input words in a word set can generate natural language text. That is, a word set for generating natural text must be given by a translation model without excess or shortage.

However, this assumption requires a large translation corpus. Even when the Japanese language, which has a relatively excellent corpus, is the source language, the above-described known method sometimes cannot provide satisfactory text generation, depending upon the status of the translation corpus of the target language and the corpus of the target language.

Additionally, although the above-described patent document can complement some words, this is only supplementary and cannot efficiently complement the associated words.

This problem is not limited to machine translation. In general, the problem occurs in any text generation. Similarly, if a source language text is not complete, for example, if the source language text is a result of erroneous OCR recognition or erroneous speech recognition, accurate text generation cannot be obtained.

DISCLOSURE OF INVENTION

Accordingly, the present invention provides a method and a device for generating a meaningful text when there are insufficient input keywords.

To solve the above-described problem, the present invention discloses the following text generation method.

That is, the text generation method for generating text of a sentence or paragraph includes an input step in which one or more words are input as keywords.

The text generation method further includes a character-unit candidate generation step for generating character unit candidates from the keywords; a text candidate generation step for generating text candidates by assuming a dependency relation among the character units; an evaluation step for evaluating the text candidates; and an output step for outputting at least one of the evaluated text candidates.

In the input step, a word having a dependency relation to an input word may be input as an additional keyword using a database of the predetermined language in order to complement insufficient keywords.

In the character-unit candidate generation step, a character string related to at least one of the keywords may be added before or after a keyword so as to generate character unit candidates. In this case, a character string may be added in the same manner or need not be added for all of the other keywords so as to generate character unit candidates.

The text generation method according to the present invention may further include, after the input step, an extraction step for extracting a sentence or phrase containing the keywords from the database; and a generation rule acquisition step for automatically acquiring a character-unit candidate generation rule from the extracted sentence or phrase, wherein the character unit candidates may be generated using the generation rule in the character-unit candidate generation step. The database may be a corpus of the language of the text generated in the present invention, for example, the Japanese language.

Additionally, in the generation rule acquisition step, the sentence or phrase extracted in the extraction step may be analyzed by using a morphological analysis and/or a syntactic analysis, and an analyzed character unit containing the keywords may be used as a generation rule.

Furthermore, the present invention can provide a text generation device for generating text of a sentence or paragraph in a predetermined language.

The device includes input means for inputting one or more words as keywords; character-unit candidate generation means for generating character unit candidates from the keywords; text candidate generation means for generating text candidates by assuming a dependency relation among the character units; evaluation means for evaluating the text candidates; and output means for outputting at least one of the evaluated text candidates.

Herein, in the input means, a word having a dependency relation to the input word may be input as an additional keyword using a database of the predetermined language.

Additionally, in the character-unit candidate generation means, a character string related to at least one of the keywords may be added before or after a keyword, and for all of the other keywords, a character string may be added in the same manner or need not be added so as to generate character unit candidates.

The text generation device according to the present invention may further include extraction means for extracting a sentence or phrase containing the keywords from the database; and generation rule acquisition means for automatically acquiring a character-unit candidate generation rule from the extracted sentence or phrase, wherein the character unit candidates may be generated using the generation rule in the character-unit candidate generation means.

In the generation rule acquisition means, the sentence or phrase extracted in the extraction means may be analyzed by using a morphological analysis and/or a syntactic analysis, and an analyzed character unit containing the keywords may be used as a generation rule.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a diagram for explaining a text generation device according to the present invention.

FIG. 2 is a block diagram of the text generation device according to the present invention.

FIG. 3 shows an example for explaining text generation from keywords.

FIG. 4 is a diagram for explaining a relationship between a keyword and other words.

FIG. 5 is a block diagram of a dependency-related-word extraction unit according to the present invention.

Reference Numerals

1 text generation device
2 keyword
3 text
4 Bunsetsu-generation-rule acquisition unit
5 Bunsetsu-candidate generation unit
6 text candidate generation unit
7 evaluation unit
8 corpus
9 Bunsetsu generation rule
10 keyword generation model
11 language model

BEST MODE FOR CARRYING OUT THE INVENTION

A preferred embodiment of the present invention will be described below with reference to the accompanying drawings. However, the present invention is not limited to the following embodiment, and can be appropriately modified.

FIG. 1 is a diagram for explaining a text generation device (1) according to the present invention (hereinafter referred to simply as the device (1)). As its simplest function, when the device (1) receives three keywords (2), for example, “KANOJO”, “IE”, and “IKU”, it generates texts such as “KANOJONOIENIIKU” (3 a) and “KANOJOGAIENIITTA” (3 b).

An example of the device (1) includes the units shown in FIG. 2. The device (1) is composed of, for example, a personal computer that includes a CPU, a memory, and an external storage medium, such as a hard disk. The CPU carries out the main process and stores the processing result in a RAM and the external storage medium as needed.

According to the present invention, keywords (2) are input from a keyboard connected to the personal computer, or data output from another language processing system may be used as the keywords (2).

In this embodiment, the keywords (2) are used by two processing units. One is a Bunsetsu-generation-rule acquisition unit (4), and the other is a Bunsetsu-candidate generation unit (5).

Herein, the target language is Japanese, and a Bunsetsu is generated as a character unit. Bunsetsus are minimal linguistic units obtained by segmenting a sentence naturally in terms of semantics and phonetics, and each of them consists of one or more morphemes. The keyword is defined as a head-word of the Bunsetsu. The word that becomes a head-word of the Bunsetsu is a content word nearest to the end of a sentence. As used herein, a content word refers to a direction word of a morpheme whose part of speech is a verb, an adjective, a noun, a demonstrative, an adverb, a conjunction, an adnominal, an interjection, or an undefined word. Other direction words of morphemes are considered to be function words. However, SA-hen verbs, the verb “NARU”, and the formal noun “NO” are considered to be function words except when other content words do not exist in the Bunsetsu. The classification of the part of speech follows that in the Kyoto University corpus version 3.0 (Kurohashi and Nagao, 1997).

The Bunsetsu-generation-rule acquisition unit (4), upon receipt of the keywords “KANOJO”, “IE”, and “IKU”, searches the corpus (8) for sentences containing each keyword, and carries out a morphological analysis and a syntactic analysis (dependency analysis). Thereafter, the Bunsetsu-generation-rule acquisition unit (4) extracts Bunsetsus containing the keywords (2), acquires Bunsetsu generation rules (9) from the Bunsetsus, and stores them. For example, the Bunsetsu generation rules (9) for generating a Bunsetsu from a keyword include “KANOJO”→“KANOJONO”, “IE”→“IENI”, “IKU”→“IKU”, and “IKU”→“ITTA”.

Herein, the generation rules are automatically acquired by using the following technique. When a set of keywords is V and a set of rules for generating a Bunsetsu from a keyword k (∈ V) is R_k, a rule r_k (∈ R_k) is expressed as:

k → h_k m*

where h_k is a head morpheme and m* is any number of consecutive morphemes subsequent to h_k. When keywords are given, the Bunsetsu-generation-rule acquisition unit (4) automatically acquires rules that satisfy the above-described form from a monolingual corpus.
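The acquisition of rules of the form k → h_k m* can be illustrated with a minimal Python sketch. It assumes that the corpus has already been morphologically analyzed and segmented into Bunsetsus, each supplied as a pair of the head word's base form and the surface morphemes starting at the head morpheme. This data layout, the function name acquire_rules, and the sample data are hypothetical and are not taken from the specification.

```python
from collections import defaultdict


def acquire_rules(keywords, analyzed_bunsetsus):
    """Collect rules k -> h_k m* for each keyword k.

    analyzed_bunsetsus: iterable of (head_base_form, morphemes) pairs, where
    morphemes is the surface morpheme sequence starting at the head morpheme.
    (Hypothetical layout; a real system would obtain it from morphological
    and dependency analysis of corpus sentences.)
    """
    wanted = set(keywords)
    rules = defaultdict(set)
    for head, morphemes in analyzed_bunsetsus:
        if head in wanted:
            # The right-hand side is the head morpheme plus its trailing morphemes.
            rules[head].add("".join(morphemes))
    return {k: sorted(v) for k, v in rules.items()}


# Illustrative, made-up corpus fragments.
corpus = [
    ("KANOJO", ["KANOJO", "NO"]),
    ("IE", ["IE", "NI"]),
    ("IKU", ["IKU"]),
    ("IKU", ["IT", "TA"]),
    ("HON", ["HON", "WO"]),  # ignored: HON is not among the keywords
]
rules = acquire_rules(["KANOJO", "IE", "IKU"], corpus)
print(rules)
```

Only Bunsetsus whose head word matches a keyword contribute rules, which corresponds to restricting the search to sentences containing the input keywords.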

On the other hand, the Bunsetsu-candidate generation unit (5) generates candidates of Bunsetsus that constitute text (3) from the input keywords (2), while referring to the Bunsetsu generation rules (9).

For example, the bare word “KANOJO” is unlikely to appear as part of a natural text. Therefore, a character string closely related to the word “KANOJO” is added to form “KANOJONO” or “KANOJOGA”, which is used in the subsequent text generation process.
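The candidate generation described above can be sketched as follows. The sketch assumes that the generation rules are held as a mapping from each keyword to its possible Bunsetsu strings; the function name bunsetsu_candidates and the fallback to the bare keyword when no rule exists are assumptions made for illustration only.

```python
from itertools import product


def bunsetsu_candidates(keywords, rules):
    """Return every combination that picks one Bunsetsu candidate per keyword."""
    # Assumption: a keyword without any rule is passed through unchanged.
    per_keyword = [rules.get(k, [k]) for k in keywords]
    return [list(combo) for combo in product(*per_keyword)]


rules = {
    "KANOJO": ["KANOJONO", "KANOJOGA"],
    "IE": ["IENI"],
    "IKU": ["IKU", "ITTA"],
}
combos = bunsetsu_candidates(["KANOJO", "IE", "IKU"], rules)
for combo in combos:
    print(combo)
```

With two rules for “KANOJO”, one for “IE”, and two for “IKU”, four combinations result, including the Bunsetsu sequences underlying “KANOJONOIENIIKU” and “KANOJOGAIENIITTA”.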

As described in this embodiment, because the Bunsetsu-generation-rule acquisition unit (4) generates the Bunsetsu generation rules for the input keywords (2) from the corpus (8), the Bunsetsu generation rules are generated efficiently with a minimum amount of calculation, thus increasing the processing speed.

According to the present invention, although phrases associated with the keywords (2) are extracted from a corpus, any phrases may instead be added before and after each keyword (2) without using a corpus, depending on the available computational power.

According to the present invention, since an evaluation unit (7), which will be described below, precisely evaluates Bunsetsu candidates even when arbitrary phrases are added, a Bunsetsu candidate having the highest evaluation value can be generated.

Subsequently, a text candidate generation unit (6) generates candidates of the text. The candidates of the text are represented in the form of a graph or a tree.

That is, as shown in FIG. 3, assuming that a dependency relation exists among the Bunsetsu candidates 4 a to 4 f, text candidates, such as a text candidate 1 (12) and a text candidate 2 (13), are generated in the form of a dependency structure tree in which each Bunsetsu is a node.

Here, the text candidates are generated in the form of a dependency structure tree so as to satisfy the following conditions:

(i) The dependency is directed from a preceding element to a following element (left to right modification).

(ii) The dependencies do not cross each other (cross dependency restriction).

(iii) A modifying element has only one modified element.

For example, in the case of three keywords, assuming that Bunsetsu candidates containing the keywords are b1, b2, and b3, if the order is fixed, two text candidates (b1 (b2 b3)) and ((b1 b2) b3) are generated. If the order is not fixed, sixteen text candidates are generated.
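The fixed-order case can be checked with a short Python sketch that enumerates the dependency structures satisfying conditions (i) to (iii). The representation of a structure as a tuple of head indices and the function name dependency_structures are assumptions made for illustration; the sketch covers only the fixed-order case.

```python
from itertools import product


def dependency_structures(n):
    """Enumerate dependency structures for n Bunsetsus in a fixed order.

    Each result is a tuple `heads` where heads[i] is the index of the Bunsetsu
    modified by Bunsetsu i. The last Bunsetsu is the root and has no head.
    """
    results = []
    # Conditions (i) and (iii): every non-final Bunsetsu modifies exactly one
    # Bunsetsu that follows it.
    for heads in product(*[range(i + 1, n) for i in range(n - 1)]):
        # Condition (ii): two dependencies cross when i < j < heads[i] < heads[j].
        crossing = any(
            j < heads[i] < heads[j]
            for i in range(n - 1)
            for j in range(i + 1, n - 1)
        )
        if not crossing:
            results.append(heads)
    return results


three = dependency_structures(3)
print(three)
```

For three Bunsetsus (indices 0, 1, 2), the head tuple (1, 2) corresponds to ((b1 b2) b3) and the head tuple (2, 2) corresponds to (b1 (b2 b3)), which are the two fixed-order candidates given in the text.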

The generated text candidates, for example, the text candidates (12) and (13), are ordered by an evaluation unit (7) using a keyword generation model (10) and a language model (11) that have been trained on the corpus.

The keyword generation model (10) is described first, followed by the language model (11), which is described as a morpheme model and a dependency model.

As the keyword generation model, five models (KM1 to KM5) are considered, each using one of the following five types of information as primitives. Hereinafter, it is assumed that a keyword set V is a set of head words that appear more than a predetermined number of times in the corpus and that Bunsetsus are represented as described above. The keywords are also assumed to be independent. If a given text has a word sequence w_1 ... w_m, a keyword k_i corresponds to a word w_j (1≦j≦m).

[KM1]

The Preceding Two Words are Considered (Trigram):

It is assumed that k_i is dependent on only the preceding two words w_{j-1} and w_{j-2}.

$$P(K \mid M, D, T) = \prod_{i=1}^{n} P(k_i \mid w_{j-1}, w_{j-2}) \qquad \text{[Equation 1]}$$

[KM2]

The Subsequent Two Words are Considered (Backward Trigram):

It is assumed that k_i is dependent on only the subsequent two words w_{j+1} and w_{j+2}.

$$P(K \mid M, D, T) = \prod_{i=1}^{n} P(k_i \mid w_{j+1}, w_{j+2}) \qquad \text{[Equation 2]}$$

[KM3]

Modifying Bunsetsus are Considered (Modifying Bunsetsu):

It is assumed that, if Bunsetsus that modify the Bunsetsu containing k_i exist, k_i is dependent on only two words w_l and w_{l-1} in the modifying Bunsetsu nearest to the end of the sentence (refer to FIG. 4).

$$P(K \mid M, D, T) = \prod_{i=1}^{n} P(k_i \mid w_l, w_{l-1}) \qquad \text{[Equation 3]}$$

[KM4]

A Modified Bunsetsu is Considered (Modified Bunsetsu):

It is assumed that, if a Bunsetsu that is modified by the Bunsetsu containing k_i exists, k_i is dependent on only two head words w_s and w_{s+1} in the modified Bunsetsu (refer to FIG. 4).

$$P(K \mid M, D, T) = \prod_{i=1}^{n} P(k_i \mid w_s, w_{s+1}) \qquad \text{[Equation 4]}$$

[KM5]

At Least Two Modifying Bunsetsus are Considered (Two Modifying Bunsetsus):

It is assumed that, if Bunsetsus that modify the Bunsetsu containing k_i exist, k_i is dependent on only the last two words w_l and w_{l-1} in the modifying Bunsetsu nearest to the end of the sentence and the last two words w_h and w_{h-1} in the modifying Bunsetsu nearest to the beginning of the sentence (refer to FIG. 4).

$$P(K \mid M, D, T) = \prod_{i=1}^{n} P(k_i \mid w_l, w_{l-1}, w_h, w_{h-1}) \qquad \text{[Equation 5]}$$
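As an illustration of how a product of this kind might be computed, the following Python sketch scores a word sequence under the trigram model KM1. The specification does not state how the conditional probabilities are estimated; the sketch assumes simple relative-frequency counts over training sentences, a hypothetical padding symbol for sentence-initial positions, and a score of zero for unseen contexts. The function names and the training data are likewise made up for illustration.

```python
from collections import Counter

BOS = "<s>"  # hypothetical padding symbol for sentence-initial positions


def train_km1(sentences):
    """Count (w_{j-2}, w_{j-1}, w_j) trigrams and their two-word contexts."""
    tri, ctx = Counter(), Counter()
    for words in sentences:
        padded = [BOS, BOS] + list(words)
        for j in range(2, len(padded)):
            context = (padded[j - 2], padded[j - 1])
            tri[context + (padded[j],)] += 1
            ctx[context] += 1
    return tri, ctx


def km1_score(words, keyword_positions, tri, ctx):
    """Product over keywords of P(k_i | w_{j-1}, w_{j-2}), by relative frequency."""
    padded = [BOS, BOS] + list(words)
    score = 1.0
    for j in keyword_positions:  # j indexes into `words`
        # padded[j] and padded[j + 1] are the two words preceding words[j].
        context = (padded[j], padded[j + 1])
        if ctx[context] == 0:
            return 0.0  # assumption: unseen contexts receive probability zero
        score *= tri[context + (words[j],)] / ctx[context]
    return score


# Made-up training sentences, already split into words.
training = [
    ["KANOJO", "NO", "IE", "NI", "IKU"],
    ["KARE", "NO", "IE", "NI", "KURU"],
]
tri, ctx = train_km1(training)
score = km1_score(["KANOJO", "NO", "IE", "NI", "IKU"], [0, 2, 4], tri, ctx)
print(score)
```

The other models differ only in which words form the conditioning context, so the same counting scheme could be adapted by changing how the context tuple is built.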

The morpheme model (MM) will be described next. Herein, it is assumed that there are l grammatical properties, one of which is assigned to each morpheme. The model used here estimates the probability that, when text, that is, a character string, is given, the character string is a morpheme and the morpheme has the jth (1≦j≦l) grammatical property.

When text T is given, the probability of obtaining an ordered morpheme set M is given by the following equation, assuming that the morphemes m_i (1≦i≦n) are independent of each other.

$$P(M \mid T) = \prod_{i=1}^{n} P(m_i \mid m_1^{i-1}, T) \qquad \text{[Equation 6]}$$

where m_i is one of the grammatical properties from 1 to l.

On the other hand, in the dependency model (DM), when text T and an ordered morpheme set M are given, the probability of obtaining an ordered set D of the dependency relations of every Bunsetsu is given by the following equation, assuming that the dependency relations d_1, ..., d_n are independent of each other.

$$P(D \mid M, T) = \prod_{i=1}^{n} P(d_i \mid M, T) \qquad \text{[Equation 7]}$$

For example, when three keywords “KANOJO KOUEN ITTA” are input and two candidates “(KANOJOWA(KOUENEITTA))” and “((KANOJONOKOUENE)ITTA)” are generated, the dependency model prioritizes the candidate having the more likely dependency structure over the other candidate.

According to the present invention, using the above-described models, the evaluation unit (7) evaluates the text candidates, such as text candidates (12) and (13).

Subsequently, the text candidate having the maximum evaluation value, text candidates having an evaluation value greater than a predetermined threshold, or text candidates having the top N evaluation values are converted to surface sentences, which are then output.
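The three output policies (maximum value, threshold, and top N) can be sketched in Python as follows. The function name select_outputs, its parameters, and the evaluation values in the example are hypothetical; the sketch assumes that each candidate has already been assigned an evaluation value by the evaluation unit.

```python
def select_outputs(scored, threshold=None, top_n=None):
    """Choose candidates to output from (text, evaluation_value) pairs.

    With a threshold, return all candidates scoring above it; with top_n,
    return the N best; otherwise return only the single best candidate.
    """
    ranked = sorted(scored, key=lambda pair: pair[1], reverse=True)
    if threshold is not None:
        return [text for text, value in ranked if value > threshold]
    if top_n is not None:
        return [text for text, _ in ranked[:top_n]]
    return [ranked[0][0]] if ranked else []


# Made-up evaluation values for illustration only.
scored = [
    ("KANOJOGAIENIITTA", 0.30),
    ("KANOJONOIENIIKU", 0.45),
    ("KANOJONOIENIITTA", 0.10),
]
best = select_outputs(scored)
above = select_outputs(scored, threshold=0.2)
top2 = select_outputs(scored, top_n=2)
print(best, above, top2)
```

Returning the candidates in descending order of evaluation value also matches the option of presenting several texts so that an operator can choose among them.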

The surface sentences are output on a monitor. Alternatively, they may be output as a voice by a speech synthesizer or may be output as data delivered to another language processing system, such as a translation system.

Thus, when the keywords (2), for example, “KANOJO”, “IE”, and “IKU” are input, the texts (3) including “KANOJONOIENIIKU” (3 a) and “KANOJOGAIENIITTA” (3 b) can be output. As described above, the text having the maximum evaluation value may be output. Alternatively, a plurality of the texts may be output in order of evaluation value. For example, a plurality of the texts may be output and an operator who inputs the keywords may select the most appropriate text among them.

In the above-described embodiment, phrases are added before and after the keywords. In addition, the keywords themselves (corresponding to head words) can be complemented.

For example, in order to complement predicates to “KARE HON” and generate “KAREGAHONWOYONDA”, “KAREGAHONWOKAITA”, and “KAREGAHONWOKATTA”, additional keywords can be added to the input keywords.

More specifically, the components shown in FIG. 5 are added to the structure shown in FIG. 2. That is, the keywords (2) are also input to a dependency-related-word extraction unit (14), which extracts words having a dependency relation to the keywords (2) from the corpus (8).

Subsequently, the words are input as additional keywords along with the original keywords (2) to the Bunsetsu-candidate generation unit (5).

For example, if the corpus (8) does not contain “(KAREGA(HONWO YOMU))”, but contains dependency relations “(KAREGA YOMU)” and “(HONWO YOMU)”, the Bunsetsu-candidate generation unit (5) can generate “YONDA” by adding “YOMU”, which is common to the two dependency relations, as a new keyword.
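The extraction of a word common to several dependency relations can be sketched in Python as follows. The sketch assumes that the corpus has been reduced to pairs of head words (modifier, modified), and it interprets "common to the dependency relations" as the intersection of the words modified by each keyword. Both the data layout and this interpretation, as well as the function name related_words, are assumptions made for illustration.

```python
from collections import defaultdict


def related_words(keywords, dependency_pairs):
    """Return words that every keyword modifies somewhere in the corpus.

    dependency_pairs: iterable of (modifier_head_word, modified_head_word)
    tuples (hypothetical layout derived from dependency-analyzed sentences).
    """
    modified_by = defaultdict(set)
    for modifier, modified in dependency_pairs:
        modified_by[modifier].add(modified)
    common = None
    for k in keywords:
        common = modified_by[k] if common is None else common & modified_by[k]
    return sorted(common or set())


# Made-up dependency pairs; (KARE, HON, YOMU) never co-occur in one sentence.
pairs = [
    ("KARE", "YOMU"),
    ("HON", "YOMU"),
    ("KARE", "IKU"),
    ("HON", "KAU"),
]
extra = related_words(["KARE", "HON"], pairs)
print(extra)
```

The extracted words would then be passed, together with the original keywords, to the Bunsetsu-candidate generation step.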

Although this structure can decrease the amount of calculation and allows keywords to be added at high speed, the present invention is not limited to extracting words having a dependency relation from a corpus. Alternatively, any keyword candidates may be added, and texts having the highest evaluation value by the evaluation unit (7) may be output.

Thus, meaningful texts can be output even though some important words that determine the meaning of the text are not included in the keywords.

In addition, although the generated text is Japanese and the character unit generated from the text is a Bunsetsu in the above-described embodiment, the present invention can be applied to any language.

For example, for the English language, a plurality of noun phrases and verb phrases may form other noun phrases and verb phrases, respectively, to form a hierarchy structure. In this type of language, a minimal phrase, “basic phrase”, can be used instead of the Bunsetsu.

TABLE 1

INPUT (KEYWORD)               EXAMPLE OF SYSTEM OUTPUT
ROSIA RENPA TASSEI            (ROSIADE (RENPAWO TASSEISITA))
SEKAI KIROKU KAWARU           ((SEKAINO KIROKUNI) KAWARU)
KADAI JITUGEN IDOMU           ((KADAINO JITUGENNI) IDOMU)
CHIIMU TOMONI KATU            (CHIIMUDE (TOMONI KATTA))
KONO KEIKAKU KIMERU
SENGO SEIKATU SASAERU         ((SENGONO SEIKATUWO) SASAERU)
DAITOURYO KAIKEN CHINJUTSU    ((DAITOURYONO KAIKENNO) CHINJUTU)
YOUGISHA CHUGOKU TAIHO        (YOUGISHAWA (CHUGOKUDE TAIHOSARETA))
NIPPON GORIN KAISAI
KUNI SEISAKU HOSSOKU          ((KUNINO SEISAKUGA) HOSSOKUSURU)
KADAI JITSUGEN MUKAU          ((KADAINO JITSUGENNI) MUKAU)
SHAKAI HATARAKU JOSEI         ((SHAKAINI HATARAITEIRU) JOSEI)
SHUSHO SEIKEN DAKKAI          (SHUSHOGA (SEIKENWO DAKKAISURU))
TEIAN MINAOSHI TSUKURU        ((TEIANNO MINAOSHIWO) TSUKURU)
SEIKEN HOSSOKU KIMARU         (SEIKENGA (HOSSOKUNI KIMATTA))
HONKON KURUU OOI              (HONKONWA (KURUUMO OOIRASII))
TAIKAI SHUTSUJO KATU
SHOGATU RYOKOU KYUZOU         ((SHOGATUNO RYOKOUMO) KYUZOUSHITEIRU)
MINSHUKA HANTAI HITO          ((MINSHUKANI HANTAISHITEIRU) HITO)
SAKUNEN SEISAKU MITOMERU      ((SAKUNENNO SEISAKUWO) MITOMERUBEKIDA)
BEIKOKU KATSU AKIRAKADA       ((BEIKOKUNI KATEBA) AKIRAKANINARU)
JUMIN UTAGAI HIROGARU         ((JUMINNO UTAGAIGA) HIROGARU)
NIPPON CHUGOKU CHIKAI
GAIKOKUJIN KA'NYU ZOUKA       ((GAIKOKUJINNO KA'NYUSHAGA) ZOUKASHITEIRU)
GORIN SENSHUKEN KAKUTOKU      ((GORINNO SENSHUKENWO) KAKUTOKUSURU)
KIGYO TEPPAI SUSUMERU         (KIGYOGA (TEPPAIWO SUSUMERU))
SHORAI SHINSHINTO UMARERU     (SHORAIWA (SHINSHINTOGA UMARERUDAROU))
SAKUNEN KIROKU UWAMAWARU      ((SAKUNENNO KIROKUWO) UWAMAWARU)
TSUYOI CHIIMU MEZASU
YOI SHIGOTO HOSHII

Finally, experimental results using the text generation method according to the present invention are shown. In the experiment, combinations of three keywords shown in Table 1 were input and the output texts were subjectively evaluated. The following two criteria for the evaluation were used:

Criterion 1: If the top candidate is semantically and grammatically appropriate, it is determined that the output of the system is correct.

Criterion 2: If a semantically and grammatically appropriate candidate is found among the top 10 candidates, it is determined that the output of the system is correct.

The evaluation results are shown in Table 2.

TABLE 2

Model                            Criterion 1   Criterion 2
KM1 (trigram)                    13/30         28/30
KM1 + MM                         21/30         28/30
KM1 + DM                         12/30         28/30
KM1 + MM + DM                    26/30         28/30
KM2 (backward trigram)            6/30         15/30
KM2 + MM                          8/30         20/30
KM2 + DM                         10/30         20/30
KM2 + MM + DM                     9/30         25/30
KM3 (modifying Bunsetsu)         13/30         29/30
KM3 + MM                         26/30         29/30
KM3 + DM                         14/30         28/30
KM3 + MM + DM                    27/30         29/30
KM4 (modified Bunsetsu)          10/30         18/30
KM4 + MM                          9/30         26/30
KM4 + DM                          9/30         22/30
KM4 + MM + DM                    13/30         27/30
KM5 (two modifying Bunsetsus)    12/30         26/30
KM5 + MM                         17/30         28/30
KM5 + DM                         12/30         27/30
KM5 + MM + DM                    26/30         28/30

When combinations of two keywords were input, the number of text candidates generated by the generation rules was an average of 868.8 (26,064/30) per combination. When combinations of three keywords were input, the number was an average of 41,413.5 (1,242,404/30) per combination.

In Table 2, combinations of one of the above-described keyword generation models (KM1 to KM5) and the language models (MM and/or DM) are represented using the sign “+”.

As can be seen from Table 2, the cases where the model KM1, KM3, or KM5 is combined with the models MM and DM exhibit the best results. The cases using MM and DM exhibit significantly better results for Criterion 1 than the cases not using MM and DM. This is because a relationship between a verb and a case of the verb is stronger than that between a noun and a case of the noun, and the models KM1, KM3, and KM5 have learned based on the relationship between a verb and a case of the verb. Therefore, it seems likely that these models tend to rank candidates that generate natural sentences at higher positions.

From the result of the experiment, in the evaluation unit (7) according to the present invention, the above-described model KM1, KM3, or KM5 is preferably combined with the morpheme model and the dependency model, and the model KM3 is the most preferable.

As shown in Table 2, these combinations successfully provide correct texts at a rate of about 90 percent.
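The rate quoted above can be reproduced from the counts in Table 2 with a few lines of Python. The counts are copied from the table; the dictionary layout and variable names are chosen only for illustration.

```python
# (Criterion 1, Criterion 2) counts out of 30, copied from Table 2.
results = {
    "KM1 + MM + DM": (26, 28),
    "KM3 + MM + DM": (27, 29),
    "KM5 + MM + DM": (26, 28),
}

# Convert each count to a percentage rounded to one decimal place.
rates = {
    model: tuple(round(100 * count / 30, 1) for count in counts)
    for model, counts in results.items()
}
for model, (criterion1, criterion2) in rates.items():
    print(f"{model}: {criterion1}% (Criterion 1), {criterion2}% (Criterion 2)")
```

Under Criterion 1 the three combinations reach 86.7, 90.0, and 86.7 percent, and under Criterion 2 they reach 93.3, 96.7, and 93.3 percent.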

Finally, the text generation methods according to Claims 1 to 5 of the present invention can provide a meaningful text that is difficult to generate by known methods when the number of input keywords is insufficient.

In particular, since the text generation method according to Claim 2 of the present invention can extract words having a dependency relation to the keywords and can add them as additional keywords, more extensive text creation is achieved.

Alternatively, since the text generation method according to Claim 4 of the present invention can automatically acquire rules for generating candidates of a character unit from extracted sentences and phrases, the candidates of a character unit are efficiently generated, and therefore, the process is sped up.

Furthermore, the text generation devices according to Claims 6 to 10 of the present invention can provide a meaningful text that is difficult to generate by known methods when the number of input keywords is insufficient.

In particular, since the text generation device according to Claim 7 of the present invention can extract words having a dependency relation to the keywords and can add them as additional keywords, more extensive text creation is achieved.

Since the text generation device according to Claim 9 of the present invention can automatically acquire rules for generating candidates of a character unit from extracted sentences and phrases, the candidates of a character unit are efficiently generated, and therefore, the process is sped up and the cost is reduced.

As described above, according to the present invention, a text generation device that provides an excellent text generation method can be created and can contribute to improved natural language processing technology.

1. A text generation method for generating text of a sentence or paragraph in a predetermined language using a text generation device, comprising: inputting one or more words as keywords into the text generation device; generating phrase candidates in the text generation device from the keywords by searching for and extracting from a corpus sentences or phrases containing each keyword; generating text candidates of a sentence or paragraph in the text generation device assuming a syntactic dependency relation among the phrase candidates; evaluating the text candidates generated in the text generation device; converting the text candidates generated in the text generation device to surface sentences; and outputting at least one of the surface sentences.

2. The text generation method according to claim 1, wherein, in the step of inputting one or more words into the text generation device, a word having a dependency relation to the input word is extracted from a database of predetermined language and input as an additional keyword.

3. The text generation method according to claim 1 or 2, wherein, in the step of generating phrase candidates in the text generation device, a character string related to at least one of the keywords is added before or after the keyword, and for all of the other keywords, a character string is added in the same manner or is not added so as to generate phrase candidates.

4. The text generation method according to claim 1, further comprising: automatically acquiring a character-unit candidate generation rule; wherein the phrase candidates are generated using the generation rule.

5. The text generation method according to claim 4, wherein, the sentences or phrases containing each keyword are analyzed by using a morphological analysis and/or a syntactic analysis, and an analyzed phrase candidate containing the keywords is used as a generation rule.

6. A text generation device for generating text of a sentence or paragraph in a predetermined language, comprising: input means for inputting one or more words as keywords into the text generation device; character-unit candidate generation means for generating phrase candidates from the keywords in the text generation device by searching for and extracting from a corpus sentences or phrases containing each keyword; text candidate generation means for generating text candidates of a sentence or paragraph assuming a syntactic dependency relation among the phrase candidates; evaluation means for evaluating the text candidates; means for converting the text candidates generated in the text generation device to surface sentences; and output means for outputting at least one of the surface sentences.

7. The text generation device according to claim 6, wherein, in the input means, a word having a dependency relation to the input word is extracted from a database of the predetermined language and input as an additional keyword.

8. The text generation device according to claim 6 or 7, wherein, in the character-unit candidate generation means, means to add a character string related to at least one of the keywords before or after the keywords, and for all of the other keywords, means for adding a character string in the same manner or not adding a character string so as to generate phrase candidates.

9. The text generation device according to claim 6, further comprising: generation rule acquisition means for automatically acquiring a character-unit candidate generation rule from the extracted sentence or phrase; wherein the phrase candidates are generated using the generation rule in the character-unit candidate generation means.

10. The text generation device according to claim 9, wherein, in the generation rule acquisition means, the sentence or phrase extracted in the extraction means is analyzed by using a morphological analysis and/or a syntactic analysis, and an analyzed phrase containing the keywords is used as a generation rule.

11. A text generation method for generating text of a sentence or paragraph in a predetermined language using a text generation device, comprising: inputting one or more words as keywords into the text generation device; inputting keywords to a dependency-related word extraction unit which extracts words having a dependency relation to keywords in a corpus; subsequently inputting the extracted words along with the keywords inputted to the dependency-related word extraction unit to a candidate generation unit; generating phrase candidates in the text generation device from the keywords by searching for and extracting from the corpus sentences or phrases containing each keyword; generating text candidates of a sentence or paragraph in the text generation device assuming a syntactic dependency relation among the phrase candidates; evaluating the text candidates generated in the text generation device; converting the text candidates generated in the text generation device to surface sentences; and outputting at least one of the surface sentences.